Add CUDA-aware MPI halo exchange for MPMesh #88

Open
Shahrear2000 wants to merge 1 commit into SCOREC:dn/Velocity_solver_improvement from Shahrear2000:pmpo_MPMesh_improved

@Shahrear2000

Main changes:

  • Adds CUDA-aware MPI communication path guarded by CUDA_AWARE_MPI.
  • Keeps CPU-GPU deep_copy path as fallback.
  • Uses cached metadata and per-neighbor GPU buffers for repeated halo exchanges.
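The idea of reusing per-neighbor buffers across repeated halo exchanges can be sketched roughly as follows. This is only an illustration under stated assumptions: `NeighborBufferCache` and its methods are hypothetical names that do not come from the PR, and `std::vector` stands in for the GPU buffers so the sketch compiles without Kokkos or MPI.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical host-side stand-in for a per-neighbor buffer cache.
// In the PR the buffers live on the GPU; std::vector is used here only
// so that the sketch builds with the standard library alone.
class NeighborBufferCache {
public:
  // Return a buffer for `rank` that holds at least `count` doubles.
  // The allocation is kept between calls and only grows when needed.
  std::vector<double>& get(int rank, std::size_t count) {
    std::vector<double>& buf = buffers_[rank];
    if (buf.size() < count) {
      buf.resize(count);
      ++allocations_;
    }
    return buf;
  }

  // Number of times a buffer had to be (re)allocated.
  std::size_t allocations() const { return allocations_; }

private:
  std::unordered_map<int, std::vector<double>> buffers_;
  std::size_t allocations_ = 0;
};
```

With a cache like this, a second exchange with the same neighbor and the same message size performs no new allocation, which is the benefit the description attributes to the cached buffers.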

Runtime notes:

  • Requires MPICH_GPU_SUPPORT_ENABLED=1 for CUDA-aware MPI on Perlmutter.
  • Can disable CUDA-aware path at runtime with POLYMPO_DISABLE_CUDA_AWARE_MPI=1.
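A minimal sketch of how the compile-time guard and the runtime override might combine is shown below. The helper names `envFlagIsOne` and `useCudaAwareMpi` are hypothetical, and treating only the exact value `1` as "set" is an assumption; the PR's actual parsing may differ.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: true only when the environment variable `name`
// is set to exactly "1" (assumed convention, not confirmed by the PR).
inline bool envFlagIsOne(const char* name) {
  const char* value = std::getenv(name);
  return value != nullptr && std::strcmp(value, "1") == 0;
}

// Hypothetical helper: decide whether to take the CUDA-aware MPI path.
// CUDA_AWARE_MPI is the compile-time guard named in the description;
// POLYMPO_DISABLE_CUDA_AWARE_MPI is the runtime override it mentions.
inline bool useCudaAwareMpi() {
#ifdef CUDA_AWARE_MPI
  return !envFlagIsOne("POLYMPO_DISABLE_CUDA_AWARE_MPI");
#else
  return false;  // built without the guard: always use the deep_copy fallback
#endif
}
```

Calling such a helper once and caching the result would avoid a `getenv` lookup on every halo exchange.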

@cwsmith left a comment

I didn't see anything that was CUDA specific so I'd suggest replacing 'CUDA' in the function/structure/etc. names with 'GPU' for portability to AMD and Intel GPUs.

It looked like one of the files was mostly formatting changes. Including them in the PR makes it hard to find the critical changes.

I also flagged the function names that end in a number or in 'improved'.

There was also no test case added for this functionality. Any unit test that exercises this code would be a significant improvement.

Note: I didn't read the GPU MPI exchange code in detail. If feedback at that level is needed, I can take a look.

Comment thread src/CMakeLists.txt
)

add_library(polyMPO-core ${SOURCES})
target_compile_definitions(polyMPO-core PUBLIC CUDA_AWARE_MPI)

Naming this 'GPU_AWARE_MPI' would be more portable (i.e., to AMD and Intel GPUs).

Comment thread src/pmpo_c.cpp
int numVertices = p_mesh->getNumVertices();
auto vtxFieldVel = p_mesh->getMeshField<polyMPO::MeshF_Vel>();
mpMesh->communicate_and_take_halo_contributions1(vtxFieldVel, numVertices, 2, 1, 1);
mpMesh->communicate_and_take_halo_contributions1_improved(vtxFieldVel, numVertices, 2, 1, 1);

I suggest removing the '1' and replacing 'improved' with something meaningful.

Comment thread src/pmpo_MPMesh.hpp
void startCommunication();

void communicate_and_take_halo_contributions(const Kokkos::View<double**>& meshField, int nEntities, int numEntries, int mode, int op);
void communicate_and_take_halo_contributions(

Is this API still in use?

Comment thread src/pmpo_MPMesh.hpp
//communicateFields1(fieldData1, nEntities, numEntries, mode, recvIDVec, recvDataVec);
communicateFields1(reconVals_host, nEntities, numEntries, mode, recvIDVec, recvDataVec);

communicateFields1(

In general, adding a number to an API (i.e., the 1 at the end) is just going to create problems in the long term for anyone but the person who wrote them.

Comment thread src/pmpo_MPMesh.hpp
Comment on lines +200 to +203
const ViewType& fieldData,
const int numEntities,
const int numEntries,
int mode,

These formatting-only changes should ideally not be part of this PR.

Comment thread src/pmpo_MPMesh.hpp
}
}

Kokkos::fence();

Is this required?

Comment thread src/pmpo_MPMesh.hpp
const int vertex = recvIDGPU(i);

for(int k = 0; k < numEntries; k++){
#ifdef POLYMPO_ASSUME_UNIQUE_HALO_CONTRIBS

Describing this flag in the docstring for the function would be a good idea.
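As a rough illustration of what such a docstring could look like, here is a serial host stand-in for the unpack loop. Everything in it is an assumption made for the sketch: the function name `accumulateHalo`, the flat row-major layout, the choice of addition as the operation, and in particular the stated meaning of `POLYMPO_ASSUME_UNIQUE_HALO_CONTRIBS`, which is inferred from the flag's name rather than confirmed by the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

/// Add received halo values into a vertex field (serial host stand-in).
///
/// \param field      row-major field, numEntries values per vertex
/// \param recvIDs    destination vertex index for each received row
/// \param recvData   received values, numEntries per row
/// \param numEntries number of values per vertex
///
/// Compile-time flag POLYMPO_ASSUME_UNIQUE_HALO_CONTRIBS (assumed meaning):
/// when defined, the caller guarantees that no vertex index repeats in
/// recvIDs, so a GPU kernel could use a plain add. When undefined, repeated
/// indices are allowed and a GPU kernel would need an atomic add.
/// This serial version gives the same result either way.
inline void accumulateHalo(std::vector<double>& field,
                           const std::vector<int>& recvIDs,
                           const std::vector<double>& recvData,
                           int numEntries) {
  assert(recvData.size() ==
         recvIDs.size() * static_cast<std::size_t>(numEntries));
  for (std::size_t i = 0; i < recvIDs.size(); ++i) {
    const std::size_t vertex = static_cast<std::size_t>(recvIDs[i]);
    for (int k = 0; k < numEntries; ++k) {
      field[vertex * numEntries + k] += recvData[i * numEntries + k];
    }
  }
}
```

Stating the uniqueness precondition next to the signature lets a caller see when defining the flag is safe without reading the kernel body.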
